
    A NOVEL SEMANTIC SIMILARITY SCORE FOR PROTEIN DATA ANALYSIS

    Aim: A similarity measure for Gene Ontology (GO) terms is developed. Results: The proposed method takes into account the semantics hidden in ontologies: term-level information content, term membership, and topology-based similarity measures. The proposed method is evaluated on positive and negative datasets from UniProt and protein family clans, and by its Pearson's correlation with existing methods. Conclusion: The experimental results show that the proposed method clearly outperforms other semantic similarity measures. Highlights: 1. An improved approach for evaluating the semantic similarity of GO terms, based on information content and topological factors, is developed. 2. The proposed method shows the highest correlation for the MF (Molecular Function) ontology.
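
The abstract does not give the proposed formula, so the following is only a minimal sketch of one of its named ingredients, information content (IC). It computes Lin's IC-based similarity (a well-known existing measure, not the paper's method) on a tiny made-up ontology. The term names, the `PARENTS` graph, and the annotation counts are all hypothetical.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real GO data).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "protein_binding": ["binding"],
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "ion_binding": 1,
    "protein_binding": 1,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) counts annotations of t and of its descendants.

    Assumes every term covers at least one annotation (true for the toy data).
    """
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    ic_mica = max(information_content(t) for t in common)
    denom = information_content(a) + information_content(b)
    return 2 * ic_mica / denom if denom > 0 else 1.0
```

In this toy graph, two sibling terms under `binding` share an informative ancestor and get a non-zero score, while terms that only meet at `root` score zero. A topology-aware measure such as the one described in the abstract would add further factors on top of this.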

    Identification of monolingual and code-switch information from English-Kannada code-switch data

    Code-switching is very common in social media communication, predominantly in multilingual countries like India. Using more than one language in communication is known as code-switching or code-mixing. Important applications of code-switch analysis include machine translation (MT), shallow parsing, dialog systems, and semantic parsing. Identifying code-switch and monolingual information is useful for better communication on online networking websites. In this paper, we applied a character-level n-gram approach to identify monolingual and code-switch information in English-Kannada social media data. We compared various machine learning techniques, namely naïve Bayes (NB), support vector classifier (SVC), logistic regression (LR), and neural network (NN), on English-Kannada code-switch (EKCS) data. The character-level n-gram approach provides an improvement of 1.8% to 4.1% in accuracy and 1.6% to 3.8% in F1-score. We also observed that SVC and NN outperform the other techniques in accuracy (97.9%) and F1-score (98%) with character-level n-grams.
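
To illustrate what a character-level n-gram classifier looks like, here is a minimal sketch that pairs n-gram extraction with a small multinomial naïve Bayes model using add-one smoothing. The n-gram range (1 to 3), the padding with spaces, and the class name `NgramNaiveBayes` are assumptions made for illustration; the abstract does not state the paper's actual feature settings.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of the space-padded, lower-cased text."""
    padded = f" {text.lower()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (illustrative only)."""

    def fit(self, texts, labels):
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter(labels)        # label -> number of texts
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.gram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            # Add-one (Laplace) smoothing over the shared n-gram vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model would be trained with `NgramNaiveBayes().fit(texts, labels)`, where `labels` might be strings such as `"monolingual"` and `"code-switch"`. Character n-grams help here because romanized words from different languages tend to differ in their short letter sequences even when the vocabulary is noisy.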

    Data driven algorithm selection to predict agriculture commodities price

    Price prediction and forecasting are common in the agriculture sector. Previous research shows that advances in prediction and forecasting algorithms help farmers get a better return for their produce. Selecting the best-fitting algorithm for a given data set and commodity is crucial, because historical experimental results show that the performance of the algorithms varies with the input data. Our main objective was to develop a model in which the best-performing prediction algorithm is selected for the given data set. For the experiment, we used seasonal autoregressive integrated moving average (SARIMA), stack ensemble, and gradient boosting algorithms for the commodities Tomato and Potato with monthly and weekly average prices. The experimental results show that no single algorithm performs consistently across the given commodities and price data. For monthly forecasting of Tomato prices, the proposed model finds stack ensemble to be the better choice for the states of Karnataka and Madhya Pradesh, with 59% and 61% accuracy. For monthly Potato prices in Karnataka and Maharashtra, the stack ensemble model gave 60% and 85% accuracy. For weekly prediction, gradient boosting is more accurate than the other models.
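
The selection idea can be sketched as: fit every candidate on the early part of a price series, score each on a held-out tail, and keep the one with the lowest error. The sketch below uses three deliberately simple stand-in forecasters instead of SARIMA, stack ensemble, and gradient boosting, and it uses MAPE as the error metric. Both choices are assumptions, as are all function names.

```python
def last_value(history, horizon):
    """Repeat the most recent observation."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    avg = sum(history[-window:]) / window
    return [avg] * horizon


def seasonal_naive(history, horizon, season=4):
    """Repeat the last full season of observations."""
    return [history[-season + (i % season)] for i in range(horizon)]


def mape(actual, predicted):
    """Mean absolute percentage error (assumes no actual value is zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def select_forecaster(series, candidates, holdout):
    """Score each candidate on the last `holdout` points and return the best."""
    train, test = series[:-holdout], series[-holdout:]
    errors = {name: mape(test, fn(train, holdout)) for name, fn in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


CANDIDATES = {
    "last_value": last_value,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}
```

Calling `select_forecaster(prices, CANDIDATES, holdout=4)` once per commodity and state gives a per-dataset winner, which matches the abstract's observation that no single algorithm wins everywhere.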

    Effective Prostate Cancer Detection using Enhanced Particle Swarm Optimization Algorithm with Random Forest on the Microarray Data

    Prostate Cancer (PC) is the leading cause of mortality among males; therefore, an effective system is required to identify sensitive biomarkers for early recognition. The objective of this research is to find potential biomarkers for characterizing the different types of PC. In this article, the PC-related genes are acquired from the Gene Expression Omnibus (GEO) database. Gene selection is then performed using an enhanced Particle Swarm Optimization (PSO) algorithm to select the active genes related to PC. In the enhanced PSO algorithm, the interval Newton approach is included to keep the search space adaptive by varying the swarm diversity, which helps the local search considerably. The selected active genes are fed to a random forest classifier to classify PC as high-risk or low-risk. In the experimental investigation, the proposed model achieved an overall classification accuracy of 96.71%, which is better than traditional models such as naïve Bayes, support vector machine, and neural network.

    Sentiment Analysis Framework using Deep Active Learning for Smartphone Aspect Based Rating Prediction

    Social media are a rich source of user-generated content where people express their views on the products and services they encounter. However, sentiment analysis using machine learning models is not easy to implement in a time- and cost-effective manner because expert human annotators are required to label the training data. The proposed approach uses a novel method to remove neutral statements using a combination of a lexicon-based approach and human effort. A deep active learning model then performs sentiment analysis with reduced annotation effort. The approach is compared with a baseline that also includes the neutral tweets as part of the data. Because brands require aspect-based ratings of their products or services, the proposed approach also categorizes and predicts ratings for each aspect of a mobile device.

    Spamming the mainstream: A survey on trending Twitter spam detection techniques

    In recent years, people have been using social networking sites more frequently, and as a result these sites are growing very fast. Twitter is one such micro-blogging site, where users can connect with new people and learn what is happening in the world through the topics discussed on Twitter. For this reason, Twitter is targeted by malicious users who post harmful links and unwanted messages of no interest to users, which is called spam. In this paper, a comprehensive survey of existing methods for Twitter spam detection is presented. From this survey, it is clear that detecting URL content in a tweet is very important for determining whether the tweet is spam or non-spam. The advantages and flaws of the methods are discussed. A comparative analysis of the existing detection methods is also presented by reviewing research papers published from 2010 to 2017. There is a lot of scope for the researchers to

    NBLex: emotion prediction in Kannada-English code-switch text using naïve bayes lexicon approach

    Emotion analysis is the process of identifying human emotions from various data sources. Emotions can be expressed in either monolingual text or code-switch text. Emotion prediction can be performed through machine learning (ML), deep learning (DL), or a lexicon-based approach. ML and DL approaches are computationally expensive and require training data, whereas the lexicon-based approach does not require any training data and takes much less time to predict emotions than ML and DL. In this paper, we propose a lexicon-based method called NBLex to predict the emotions associated with Kannada-English code-switch text, a problem that has not been addressed until now. We applied the One-vs-Rest approach to generate the scores for the lexicon and also to predict the emotions from the code-switch text. The accuracy of the proposed NBLex model (87.9%) is better than that of naïve Bayes (NB) (85.8%) and bidirectional long short-term memory neural network (BiLSTM) (84.7%), and the true positive rate (TPR) of NBLex (50.6%) is better than that of NB (37.0%) and BiLSTM (42.2%). Our approach shows that a simple additive model (lexicon approach) can be an alternative model for predicting emotions in code-switch text.
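
The abstract does not spell out how the NBLex scores are computed, so the following is only a sketch of one plausible reading: for each emotion, every word gets a One-vs-Rest log-ratio score (how typical the word is of that emotion versus all other emotions, with add-one smoothing), and prediction simply adds up the scores of the words in the text. The scoring formula and the function names `build_lexicon` and `predict_emotion` are assumptions.

```python
import math
from collections import Counter


def build_lexicon(samples):
    """Build {emotion: {word: score}} from (tokens, emotion) pairs.

    Each score is a smoothed log-ratio of the word's frequency inside the
    emotion versus inside all other emotions (One-vs-Rest).
    """
    emotions = {emotion for _, emotion in samples}
    vocab = {word for tokens, _ in samples for word in tokens}
    lexicon = {}
    for emotion in emotions:
        inside, outside = Counter(), Counter()
        for tokens, label in samples:
            (inside if label == emotion else outside).update(tokens)
        n_in, n_out = sum(inside.values()), sum(outside.values())
        lexicon[emotion] = {
            word: math.log((inside[word] + 1) / (n_in + len(vocab)))
            - math.log((outside[word] + 1) / (n_out + len(vocab)))
            for word in vocab
        }
    return lexicon


def predict_emotion(tokens, lexicon):
    """Additive model: sum the word scores per emotion and take the maximum."""
    totals = {
        emotion: sum(scores.get(word, 0.0) for word in tokens)
        for emotion, scores in lexicon.items()
    }
    return max(totals, key=totals.get)
```

Once the lexicon is built, prediction is a dictionary lookup and a sum per emotion, which is consistent with the abstract's point that a lexicon approach is much cheaper at prediction time than ML or DL models.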

    PMEM: Predicting multiple time series using ensemble model

    Multiple Time Series (MTS) forecasting deals with multiple time series that are unrelated to and independent of each other. Predicting each time series independently may increase time and cost. In this paper, we formalize the problem of predicting multiple time series together over an MTS database. The proposed framework addresses the following issues. First, it builds the initial ensemble model for each time series using a novel ensemble approach, thereby effectively reducing data storage and time complexity. Second, using a single ensemble engine for MTS, it performs three major tasks: prediction, building a new model, and ensemble update. Lastly, it predicts the samples by pattern sequence matching using the well-known sliding window method. The computational cost of PMEM is O(C * N * (K + nmatches)), where nmatches is the average number of patterns matched.
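
As a rough illustration of pattern sequence matching with a sliding window, the sketch below takes the last `window` values of one series as the query pattern, slides over the earlier history, and averages the values that followed each matching window. The tolerance-based match, the fallback to the last value, and the name `predict_next` are assumptions; the abstract does not describe PMEM's matching rule or define C, N, and K.

```python
def predict_next(series, window, tolerance=0.0):
    """Predict the next value by matching the latest pattern against history."""
    pattern = series[-window:]
    followers = []
    # Visit every earlier window that still has a value following it.
    for start in range(len(series) - window):
        candidate = series[start:start + window]
        if all(abs(c - p) <= tolerance for c, p in zip(candidate, pattern)):
            followers.append(series[start + window])
    if not followers:
        # No historical match: fall back to repeating the last observation.
        return series[-1]
    return sum(followers) / len(followers)
```

For one series of length N and a window of length K, this scan costs on the order of N * K comparisons plus the work on the matches, which has the same shape as the per-series part of the cost expression quoted in the abstract if N and K are read that way.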

    Consumer insight mining: Aspect based Twitter opinion mining of mobile phone reviews

    Micro-blogging sites such as Twitter are often considered a rich source of public opinion about products. The character limit on tweets encourages people to use emojis, emoticons, and out-of-vocabulary words. Because of the huge volume of tweets being generated, it is difficult to label tweets manually and build a supervised learning model for sentiment analysis. In view of these challenges, this paper aims to create a feature-level sentiment analysis model for Twitter data mining that includes features such as emoji detection, spelling correction, and emoticon detection. The proposed model labels training data automatically using a lexicon-based approach. It is an ontology-based system for the domain "Smartphone". In addition to the general lexicon, a set of lexicons specific to each attribute of the "Smartphone" domain is used to improve classification accuracy for training data generation. The generated data is used to classify tweets about a particular mobile phone with an SVM classifier. Experimental results show that the classifier based on automated training data provides good accuracy. They also demonstrate the importance of emoji detection and the attribute-specific lexicons, which help improve classification accuracy. © 2017 Elsevier B.V. All rights reserved.
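
A minimal sketch of lexicon-based automatic labeling is shown below: it sums polarity scores from a general lexicon, an emoji and emoticon lexicon, and (optionally) an attribute-specific lexicon. Every lexicon entry, the whitespace tokenization, and the function name `auto_label` are hypothetical, since the abstract does not list the lexicons actually used.

```python
# All lexicon contents below are made-up examples for illustration.
GENERAL_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
EMOJI_LEXICON = {":)": 1, ":(": -1, "😀": 1, "😡": -1}
ASPECT_LEXICONS = {
    "battery": {"drains": -1, "lasts": 1},
    "camera": {"blurry": -1, "sharp": 1},
}


def auto_label(tweet, aspect=None):
    """Label a tweet as positive, negative, or neutral from lexicon scores."""
    lexicons = [GENERAL_LEXICON, EMOJI_LEXICON]
    if aspect is not None:
        # Attribute-specific words only count when the aspect is known.
        lexicons.append(ASPECT_LEXICONS.get(aspect, {}))
    score = 0
    for token in tweet.lower().split():
        for lexicon in lexicons:
            score += lexicon.get(token, 0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `auto_label("battery drains fast")` finds no general-lexicon word and returns `"neutral"`, while passing `aspect="battery"` picks up "drains" and returns `"negative"`. That is the kind of gain the abstract attributes to attribute-specific lexicons. Labels produced this way could then serve as training data for a supervised classifier such as an SVM.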